nbinom (Negative binomial distribution)#

The negative binomial distribution models a count (discrete) outcome. In the scipy.stats.nbinom parameterization, it is:

the number of failures (K) observed before achieving n successes, when each trial succeeds with probability p.

Two common interpretations depending on n:

  • If n is a positive integer, this is literally a “repeat Bernoulli trials until n successes” model.

  • If n is a positive real (as allowed by SciPy), n is best thought of as a dispersion/shape parameter via the Gamma–Poisson mixture view.

This notebook uses the same parameterization as scipy.stats.nbinom:

  • n > 0 (integer in the classic counting story; real-valued in many statistical models)

  • p ∈ (0, 1] (success probability)

Learning goals#

By the end you should be able to:

  • write the PMF/CDF and map between common parameterizations

  • derive mean/variance and a usable log-likelihood

  • sample using a NumPy-only algorithm (Gamma–Poisson mixture)

  • visualize PMF/CDF and validate by Monte Carlo

  • use scipy.stats.nbinom for pmf, cdf, rvs, and fitting

import math

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

1) Title & Classification#

Name: nbinom (Negative binomial distribution)
Type: Discrete

Support: [ k \in \{0, 1, 2, \dots\} ]

Parameter space (SciPy): [ n > 0,\qquad 0 < p \le 1 ]

Interpretation (classic Bernoulli-trial story):

  • n is the number of successes we wait for

  • p is the success probability per trial

  • the random variable (K) counts failures before the n-th success

2) Intuition & Motivation#

What this distribution models#

The negative binomial answers questions like:

  • “How many misses happen before we see n hits?”

  • “How many non-events occur before the n-th event?”

If n=1, you get the geometric distribution (failures before the first success).
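This special case is easy to confirm with scipy.stats. Note that scipy.stats.geom counts *trials* until the first success (support starting at 1), so its argument must be shifted by one to match the failures-before-success convention:

```python
import numpy as np
from scipy import stats

p = 0.3
k = np.arange(10)

# NB(1, p): failures before the first success.
nb = stats.nbinom.pmf(k, 1, p)

# geom counts trials until the first success, so geom.pmf(k+1, p)
# is the probability of k failures before that success.
geo = stats.geom.pmf(k + 1, p)

assert np.allclose(nb, geo)
```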

Typical real-world use cases#

The negative binomial is a workhorse for count data when the Poisson model is too “tight”. Common examples:

  • number of defects before a fixed number of passes

  • number of insurance claims, incidents, arrivals in a time window

  • word counts in documents (topic models / language)

  • RNA-seq / gene expression counts (overdispersed relative to Poisson)

Relations to other distributions#

Two especially important relationships:

  1. Sum of geometrics (integer n) [ K = G_1 + \cdots + G_n,\qquad G_i\sim \text{Geom}(p) \text{ on } \{0,1,2,\dots\} ]

  2. Gamma–Poisson mixture (any real n>0) [ \Lambda \sim \text{Gamma}(\text{shape}=n,\ \text{scale}=\frac{1-p}{p}),\qquad K\mid\Lambda\sim \text{Poisson}(\Lambda) ] Then marginally (K\sim \text{NB}(n,p)). This view explains why the negative binomial is used for overdispersed counts.

A convenient re-parameterization in terms of the mean (\mu) and dispersion (n) is: [ \mu = \mathbb{E}[K],\qquad p = \frac{n}{n+\mu},\qquad \mathrm{Var}(K)=\mu+\frac{\mu^2}{n} ] So smaller n means more overdispersion.
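As a quick numerical check of this mapping (the helper name below is illustrative, not part of SciPy):

```python
import numpy as np
from scipy import stats

def nb_params_from_mean_dispersion(mu, n):
    """Map (mu, n) to SciPy's (n, p) via p = n / (n + mu). Illustrative helper."""
    return n, n / (n + mu)

mu, n = 20.0, 4.0
n_, p = nb_params_from_mean_dispersion(mu, n)
rv = stats.nbinom(n_, p)

# Mean is recovered exactly; variance follows mu + mu^2/n.
assert np.isclose(rv.mean(), mu)
assert np.isclose(rv.var(), mu + mu**2 / n)
```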

3) Formal Definition#

Let (K\sim \text{NB}(n,p)) denote the number of failures before the n-th success.

PMF#

For integers (k\ge 0): [ \mathbb{P}(K=k) = \binom{k+n-1}{k}(1-p)^k p^n ]

A numerically convenient equivalent form (valid for real (n>0)) uses gamma functions: [ \binom{k+n-1}{k} = \frac{\Gamma(k+n)}{\Gamma(n)\,\Gamma(k+1)} ] so [ \mathbb{P}(K=k) = \frac{\Gamma(k+n)}{\Gamma(n)\,\Gamma(k+1)}\,(1-p)^k\,p^n. ]

CDF#

For real (x), define (\lfloor x \rfloor) as the floor. Then [ F(x)=\mathbb{P}(K\le x)=\sum_{k=0}^{\lfloor x \rfloor}\mathbb{P}(K=k). ]

A standard special-function identity is [ \mathbb{P}(K\le k) = I_{p}(n,\ k+1),\qquad k\in\{0,1,2,\dots\} ] where (I) is the regularized incomplete beta function.
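This identity can be checked against a direct PMF sum via scipy.special.betainc, which computes the regularized incomplete beta function (I_x(a,b)):

```python
import numpy as np
from scipy import stats
from scipy.special import betainc

n, p, k = 8, 0.35, 10

# Direct sum of the PMF up to k ...
cdf_sum = stats.nbinom.pmf(np.arange(k + 1), n, p).sum()

# ... versus the regularized incomplete beta identity I_p(n, k+1).
cdf_beta = betainc(n, k + 1, p)

assert np.isclose(cdf_sum, cdf_beta)
```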

4) Moments & Properties#

Let (q = 1-p).

Mean and variance#

[ \mathbb{E}[K] = \frac{nq}{p},\qquad \mathrm{Var}(K)=\frac{nq}{p^2}. ]

Equivalently, if you use the mean (\mu): (\mu=\frac{nq}{p}), then [ \mathrm{Var}(K)=\mu+\frac{\mu^2}{n}. ]

Skewness and kurtosis#

[ \text{skew}(K)=\frac{2-p}{\sqrt{n(1-p)}} ]

The excess kurtosis is [ \text{excess kurt}(K)=\frac{6+\frac{p^2}{1-p}}{n} ] (so the full kurtosis is (3+\text{excess kurt})).

MGF and characteristic function#

For (t < -\log(1-p)): [ M_K(t)=\mathbb{E}[e^{tK}] = \left(\frac{p}{1-(1-p)e^t}\right)^{n}. ]

For real (t): [ \varphi_K(t)=\mathbb{E}[e^{itK}] = \left(\frac{p}{1-(1-p)e^{it}}\right)^{n}. ]

Entropy#

The (Shannon) entropy is [ H(K) = -\sum_{k=0}^{\infty} \mathbb{P}(K=k)\,\log \mathbb{P}(K=k). ] There is no simple elementary closed form; in practice you compute it numerically (e.g., via truncation or via SciPy).

Other useful properties#

  • Mode: for (n>1), (\left\lfloor \frac{(n-1)(1-p)}{p} \right\rfloor). For (0<n\le 1), the mode is 0.

  • Additivity (same p): if (K_1\sim \text{NB}(n_1,p)) and (K_2\sim \text{NB}(n_2,p)) are independent, then (K_1+K_2\sim \text{NB}(n_1+n_2,p)).

  • Poisson limit: with mean (\mu) fixed and (n\to\infty), the distribution approaches (\text{Poisson}(\mu)).
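The additivity property can be sanity-checked by Monte Carlo (a rough sketch; tolerances are loose to absorb sampling error):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 3, 5, 0.4
m = 200_000

# Sum of independent NB(n1, p) and NB(n2, p) samples ...
s = rng.negative_binomial(n1, p, m) + rng.negative_binomial(n2, p, m)

# ... should match NB(n1 + n2, p) in mean and variance.
t = rng.negative_binomial(n1 + n2, p, m)

assert abs(s.mean() - t.mean()) < 0.1
assert abs(s.var() - t.var()) / t.var() < 0.05
```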

def _validate_n_p(n, p):
    if isinstance(n, bool):
        raise TypeError("n must be a positive number")
    n_float = float(n)
    if not (n_float > 0.0):
        raise ValueError("n must be > 0")

    p_float = float(p)
    if not (0.0 < p_float <= 1.0):
        raise ValueError("p must be in (0, 1]")

    return n_float, p_float


def nbinom_logpmf(k, n, p):
    # Log PMF for NB(n, p) on k=0,1,2,... (SciPy parameterization).
    n, p = _validate_n_p(n, p)

    k_arr = np.asarray(k)
    out = np.full(k_arr.shape, -np.inf, dtype=float)

    k_int = k_arr.astype(int)
    valid = (k_int == k_arr) & (k_int >= 0)
    if not np.any(valid):
        return out

    if p == 1.0:
        out[valid & (k_int == 0)] = 0.0
        return out

    kv = k_int[valid]

    # log Γ(k+n) - log Γ(n) - log(k!)
    log_coeff = (
        np.vectorize(math.lgamma)(kv + n)
        - math.lgamma(n)
        - np.vectorize(math.lgamma)(kv + 1)
    )

    out[valid] = log_coeff + kv * math.log1p(-p) + n * math.log(p)
    return out


def nbinom_pmf(k, n, p):
    return np.exp(nbinom_logpmf(k, n, p))


def nbinom_mean(n, p):
    n, p = _validate_n_p(n, p)
    return n * (1 - p) / p


def nbinom_var(n, p):
    n, p = _validate_n_p(n, p)
    return n * (1 - p) / (p * p)


def nbinom_skew(n, p):
    n, p = _validate_n_p(n, p)
    q = 1 - p
    return (2 - p) / math.sqrt(n * q)


def nbinom_excess_kurt(n, p):
    n, p = _validate_n_p(n, p)
    q = 1 - p
    return (6 + (p * p) / q) / n


def nbinom_pmf_cdf_trunc(n, p, *, q=0.999, max_k=200_000):
    # Return (ks, pmf, cdf) for k=0..K capturing ~q mass via recurrence.
    n, p = _validate_n_p(n, p)

    if not (0.0 < q <= 1.0):
        raise ValueError("q must be in (0, 1]")

    if p == 1.0:
        return np.array([0]), np.array([1.0]), np.array([1.0])

    pmf0 = math.exp(n * math.log(p))
    pmf_vals = [pmf0]
    cdf_vals = [pmf0]

    k = 0
    while cdf_vals[-1] < q and k < max_k:
        k += 1
        # f(k)/f(k-1) = ((k-1+n)/k) * (1-p)
        pmf_k = pmf_vals[-1] * ((k - 1 + n) / k) * (1 - p)
        pmf_vals.append(pmf_k)
        cdf_vals.append(cdf_vals[-1] + pmf_k)

        if pmf_k == 0.0:
            break

    ks = np.arange(len(pmf_vals))
    pmf = np.array(pmf_vals, dtype=float)
    cdf = np.minimum(1.0, np.array(cdf_vals, dtype=float))
    return ks, pmf, cdf


def nbinom_entropy_trunc(n, p, *, q=0.999999, max_k=400_000):
    # Approximate entropy in nats via truncation at mass q.
    ks, pmf, _ = nbinom_pmf_cdf_trunc(n, p, q=q, max_k=max_k)

    # Lower bound (ignores tail beyond the truncation point).
    pmf = pmf[pmf > 0]
    return float(-(pmf * np.log(pmf)).sum())
n, p = 8, 0.35

moments = {
    "mean": nbinom_mean(n, p),
    "var": nbinom_var(n, p),
    "skew": nbinom_skew(n, p),
    "excess_kurt": nbinom_excess_kurt(n, p),
    "entropy_trunc_nats": nbinom_entropy_trunc(n, p),
}

moments
{'mean': 14.85714285714286,
 'var': 42.44897959183674,
 'skew': 0.7235728659282991,
 'excess_kurt': 0.7735576923076923,
 'entropy_trunc_nats': 3.2473831980948553}
# Monte Carlo check (matches formulas up to sampling error)
samples = rng.negative_binomial(n=n, p=p, size=200_000)

est_mean = samples.mean()
est_var = samples.var(ddof=0)

{
    "formula_mean": moments["mean"],
    "mc_mean": float(est_mean),
    "formula_var": moments["var"],
    "mc_var": float(est_var),
    "entropy_trunc_nats": moments["entropy_trunc_nats"],
}
{'formula_mean': 14.85714285714286,
 'mc_mean': 14.87479,
 'formula_var': 42.44897959183674,
 'mc_var': 42.5450724559,
 'entropy_trunc_nats': 3.2473831980948553}

5) Parameter Interpretation#

In the SciPy parameterization:

  • p (success probability) controls how quickly successes arrive. Larger p puts more mass near 0 failures.

  • n (shape / number of successes) controls both scale and dispersion. For integer n, it is literally the number of successes you wait for.

Two identities are especially useful: [ \mathbb{E}[K]=\frac{n(1-p)}{p},\qquad \mathrm{Var}(K)=\frac{n(1-p)}{p^2}. ]

Re-parameterization by mean (\mu) and dispersion (n): [ p = \frac{n}{n+\mu},\qquad \mathrm{Var}(K)=\mu+\frac{\mu^2}{n}. ] So with (\mu) fixed:

  • small n ⇒ large variance ⇒ heavy right tail (strong overdispersion)

  • large n ⇒ variance close to (\mu) ⇒ close to Poisson

from plotly.subplots import make_subplots

# Left: vary p with fixed n
n_fixed = 10
p_values = [0.2, 0.4, 0.6, 0.8]

# Right: fix mean mu and vary dispersion n
mu_fixed = 20
n_values = [1, 3, 10, 50]

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        f"PMF vs p (n={n_fixed})",
        f"PMF vs dispersion n (mean μ={mu_fixed})",
    ),
)

for p_ in p_values:
    ks, pmf, _ = nbinom_pmf_cdf_trunc(n_fixed, p_, q=0.995)
    fig.add_trace(
        go.Scatter(x=ks, y=pmf, mode="markers+lines", name=f"p={p_}"),
        row=1,
        col=1,
    )

for n_ in n_values:
    p_ = n_ / (n_ + mu_fixed)
    ks, pmf, _ = nbinom_pmf_cdf_trunc(n_, p_, q=0.995)
    fig.add_trace(
        go.Scatter(
            x=ks,
            y=pmf,
            mode="markers+lines",
            name=f"n={n_} (p={p_:.3f})",
        ),
        row=1,
        col=2,
    )

fig.update_xaxes(title_text="k", row=1, col=1)
fig.update_yaxes(title_text="P(K=k)", row=1, col=1)
fig.update_xaxes(title_text="k", row=1, col=2)
fig.update_yaxes(title_text="P(K=k)", row=1, col=2)
fig.update_layout(height=430, legend_title_text="")

fig.show()

6) Derivations#

6.1 Expectation#

Using the Gamma–Poisson mixture representation: ( \Lambda\sim \text{Gamma}(n,\ \text{scale}=\frac{1-p}{p}) ) and (K\mid\Lambda\sim \text{Poisson}(\Lambda)).

By iterated expectation: [ \mathbb{E}[K] = \mathbb{E}[\,\mathbb{E}[K\mid\Lambda]\,] = \mathbb{E}[\Lambda] = n\,\frac{1-p}{p}. ]

6.2 Variance#

By the law of total variance: [ \mathrm{Var}(K) = \mathbb{E}[\mathrm{Var}(K\mid\Lambda)] + \mathrm{Var}(\mathbb{E}[K\mid\Lambda]). ] For a Poisson, (\mathrm{Var}(K\mid\Lambda)=\Lambda) and (\mathbb{E}[K\mid\Lambda]=\Lambda). Hence [ \mathrm{Var}(K)=\mathbb{E}[\Lambda]+\mathrm{Var}(\Lambda). ] For (\Lambda\sim \text{Gamma}(n,\ \text{scale}=\frac{1-p}{p})): ( \mathbb{E}[\Lambda]=n\frac{1-p}{p} ) and ( \mathrm{Var}(\Lambda)=n\left(\frac{1-p}{p}\right)^2 ). So [ \mathrm{Var}(K)=\frac{n(1-p)}{p}+\frac{n(1-p)^2}{p^2}=\frac{n(1-p)}{p^2}. ]

6.3 Likelihood#

For i.i.d. observations (k_1,\dots,k_m), the log-likelihood is [ \ell(n,p) = \sum_{i=1}^m\Big[ \log\Gamma(k_i+n)-\log\Gamma(n)-\log\Gamma(k_i+1) + k_i\log(1-p) + n\log p \Big]. ]

If n is known, the MLE for p has a closed form. Differentiating with respect to p and setting the result to zero gives [ \hat p = \frac{n}{n+\bar k},\qquad \bar k = \frac{1}{m}\sum_i k_i. ] Estimating both n and p requires numerical optimization (or the mean/dispersion re-parameterization).

# Visualize the likelihood for p with n fixed

def nbinom_loglik(n, p, data):
    return float(nbinom_logpmf(data, n, p).sum())

n_fixed = 8
# Synthetic data (overdispersed counts)
data = rng.negative_binomial(n=n_fixed, p=0.35, size=400)

kbar = data.mean()
p_hat_closed = n_fixed / (n_fixed + kbar)

p_grid = np.linspace(1e-4, 1 - 1e-4, 400)
ll = np.array([nbinom_loglik(n_fixed, p, data) for p in p_grid])

fig = go.Figure()
fig.add_trace(go.Scatter(x=p_grid, y=ll, mode="lines", name="log-likelihood"))
fig.add_vline(x=p_hat_closed, line_dash="dash", annotation_text=f"MLE p≈{p_hat_closed:.3f}")
fig.update_layout(
    title=f"Log-likelihood for p (n fixed at {n_fixed})",
    xaxis_title="p",
    yaxis_title="log L(p)",
    width=850,
    height=420,
)
fig.show()

{"kbar": float(kbar), "p_hat_closed": float(p_hat_closed)}
{'kbar': 14.935, 'p_hat_closed': 0.34881185960322647}

7) Sampling & Simulation#

NumPy-only sampling via the Gamma–Poisson mixture#

Use the hierarchical model:

  1. Sample (\Lambda\sim \text{Gamma}(n,\ \text{scale}=\frac{1-p}{p}))

  2. Sample (K\mid\Lambda\sim \text{Poisson}(\Lambda))

This produces exact samples from (\text{NB}(n,p)), works for any real n>0, and uses only NumPy RNGs.

Alternative (integer n): sum of geometrics#

If n is an integer, you can sample n independent geometric “failures before success” variables and sum them.

def sample_nbinom_poisson_gamma(n, p, size=1, *, rng: np.random.Generator):
    n, p = _validate_n_p(n, p)
    if p == 1.0:
        return np.zeros(size, dtype=int)

    # Lambda ~ Gamma(shape=n, scale=(1-p)/p)
    lam = rng.gamma(shape=n, scale=(1 - p) / p, size=size)
    return rng.poisson(lam)


def sample_nbinom_geometric_sum(n, p, size=1, *, rng: np.random.Generator):
    n, p = _validate_n_p(n, p)
    if not float(n).is_integer():
        raise ValueError("geometric-sum sampler requires integer n")
    n_int = int(n)

    if p == 1.0:
        return np.zeros(size, dtype=int)

    size_tuple = (size,) if isinstance(size, (int, np.integer)) else tuple(size)

    # NumPy's geometric returns the number of trials until first success (>=1).
    # Failures before success = geometric - 1.
    g = rng.geometric(p, size=size_tuple + (n_int,)) - 1
    return g.sum(axis=-1)


n, p = 7.5, 0.35
x = sample_nbinom_poisson_gamma(n, p, size=200_000, rng=rng)

{
    "theory_mean": nbinom_mean(n, p),
    "mc_mean": float(x.mean()),
    "theory_var": nbinom_var(n, p),
    "mc_var": float(x.var(ddof=0)),
}
{'theory_mean': 13.928571428571429,
 'mc_mean': 13.939795,
 'theory_var': 39.79591836734694,
 'mc_var': 39.908960357975}

8) Visualization#

We’ll visualize:

  • the PMF (truncated to cover most probability mass)

  • the CDF (step function)

  • Monte Carlo samples vs the theoretical PMF

n, p = 10, 0.3

ks, pmf, cdf = nbinom_pmf_cdf_trunc(n, p, q=0.999)

fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(
    title=f"Negative binomial PMF (n={n}, p={p}) — truncated at CDF≈{cdf[-1]:.4f}",
    xaxis_title="k (failures)",
    yaxis_title="P(K=k)",
)
fig_pmf.show()

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(
    title=f"Negative binomial CDF (n={n}, p={p})",
    xaxis_title="k",
    yaxis_title="P(K≤k)",
    yaxis=dict(range=[0, 1.02]),
)
fig_cdf.show()

mc = sample_nbinom_poisson_gamma(n, p, size=200_000, rng=rng)
counts = np.bincount(mc, minlength=int(ks[-1]) + 1)[: len(ks)]
pmf_hat = counts / counts.sum()

fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.65))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Theory"))
fig_mc.update_layout(
    title=f"Monte Carlo vs theory (n={len(mc):,} samples)",
    xaxis_title="k",
    yaxis_title="Probability",
)
fig_mc.show()

9) SciPy Integration#

SciPy provides scipy.stats.nbinom as an rv_discrete distribution.

Common methods:

  • pmf, logpmf

  • cdf, sf, ppf

  • rvs

  • stats (mean/var/skew/excess kurtosis)

  • entropy

Fitting#

rv_discrete objects typically do not expose a .fit(...) method. In modern SciPy, you can fit discrete or continuous distributions with scipy.stats.fit.

Note: for modeling overdispersed counts, many libraries use the mean/dispersion parameterization; always check which convention you are using.
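For example, the NB2 convention common in regression software writes (\mathrm{Var}(K)=\mu+\alpha\mu^2); under that convention (an assumption about the other library's parameterization, not something SciPy defines) the mapping to SciPy's (n, p) is (n=1/\alpha), (p=n/(n+\mu)). The converter below is an illustrative sketch:

```python
import numpy as np
from scipy import stats

def nb2_to_scipy(mu, alpha):
    """Convert NB2 (mu, alpha), where Var = mu + alpha*mu^2, to SciPy's (n, p).
    Illustrative helper assuming the NB2 convention described above."""
    n = 1.0 / alpha
    return n, n / (n + mu)

mu, alpha = 12.0, 0.5
n, p = nb2_to_scipy(mu, alpha)
rv = stats.nbinom(n, p)

assert np.isclose(rv.mean(), mu)
assert np.isclose(rv.var(), mu + alpha * mu**2)
```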

import scipy.stats as stats

n, p = 12, 0.4
rv = stats.nbinom(n, p)  # frozen distribution (loc=0)

ks = np.arange(0, 8)
print("pmf:", rv.pmf(ks))
print("cdf:", rv.cdf(ks))

s = rv.rvs(size=10, random_state=rng)
print("rvs:", s)

mean_sp, var_sp, skew_sp, kurt_sp = rv.stats(moments="mvsk")
print("mean/var/skew/excess_kurt:", float(mean_sp), float(var_sp), float(skew_sp), float(kurt_sp))
print("entropy (nats):", float(rv.entropy()))

# Compare SciPy's skew/kurt to formulas
print("formula skew:", nbinom_skew(n, p))
print("formula excess kurt:", nbinom_excess_kurt(n, p))

# Fit with scipy.stats.fit (estimate n and p, keep loc fixed)
true_n, true_p = 7.5, 0.35
x = stats.nbinom.rvs(true_n, true_p, size=3000, random_state=rng)

fit_res = stats.fit(
    stats.nbinom,
    x,
    bounds={
        "n": (1e-6, 200.0),
        "p": (1e-6, 1 - 1e-6),
        "loc": (0, 0),
    },
)

fit_res
pmf: [0.000017 0.000121 0.000471 0.001319 0.002968 0.005698 0.009687 0.014946]
cdf: [0.000017 0.000138 0.000609 0.001928 0.004896 0.010594 0.020282 0.035228]
rvs: [29 16 17 13  7 17 20 16 18 24]
mean/var/skew/excess_kurt: 17.999999999999996 44.999999999999986 0.5962847939999439 0.5222222222222223
entropy (nats): 3.291640885791076
formula skew: 0.5962847939999439
formula excess kurt: 0.5222222222222223
  params: FitParams(n=8.0, p=0.36794014275523457, loc=0.0)
 success: True
 message: 'Optimization terminated successfully.'

10) Statistical Use Cases#

A) Hypothesis testing: detecting overdispersion vs a Poisson model#

A standard diagnostic for Poisson counts is the dispersion ratio: [ \hat\phi = \frac{s^2}{\bar y} ] where (\bar y) is the sample mean and (s^2) is the sample variance. For a Poisson, (\mathbb{E}[s^2]\approx \bar y), so (\hat\phi\approx 1). Values significantly larger than 1 indicate overdispersion (a common motivation for a negative binomial model).

B) Bayesian modeling: Gamma–Poisson predictive is negative binomial#

If (Y\mid\lambda\sim \text{Poisson}(\lambda)) and (\lambda\sim \text{Gamma}(\alpha,\ \text{rate}=\beta)), then the prior predictive for (Y) is negative binomial. After observing data, the posterior predictive for a new count is also negative binomial.

C) Generative modeling: heterogeneous rates#

If each observation has its own random rate (\Lambda_i) (Gamma-distributed heterogeneity), and counts are Poisson given rates, then the marginal distribution of counts is negative binomial.

from scipy.stats import chi2

# Synthetic example: data generated from a negative binomial (overdispersed)
true_n, true_p = 5.0, 0.25
m = 300

y = stats.nbinom.rvs(true_n, true_p, size=m, random_state=rng)

ybar = y.mean()
s2 = y.var(ddof=1)

# Dispersion test statistic (approximate): (m-1)*s^2 / ybar ~ Chi^2_{m-1} under Poisson
D = (m - 1) * s2 / ybar
p_value_over = chi2.sf(D, df=m - 1)  # survival function avoids cancellation in 1 - cdf

{
    "sample_mean": float(ybar),
    "sample_var": float(s2),
    "dispersion_ratio_var_over_mean": float(s2 / ybar),
    "D": float(D),
    "p_value_overdispersion": float(p_value_over),
}
{'sample_mean': 14.116666666666667,
 'sample_var': 61.39437012263098,
 'dispersion_ratio_var_over_mean': 4.349069902429585,
 'D': 1300.371900826446,
 'p_value_overdispersion': 0.0}
# Bayesian modeling: Gamma-Poisson posterior predictive
# Prior: lambda ~ Gamma(alpha, rate=beta)
alpha, beta = 2.0, 1.0

# Observe Poisson counts (synthetic)
lambda_true = 3.0
obs = stats.poisson.rvs(lambda_true, size=50, random_state=rng)

alpha_post = alpha + obs.sum()
beta_post = beta + len(obs)

# Posterior predictive for a new count:
# If lambda|data ~ Gamma(alpha_post, rate=beta_post), then
# Y_new ~ NB(n=alpha_post, p=beta_post/(beta_post+1)) in SciPy's parameterization.

n_pred = alpha_post
p_pred = beta_post / (beta_post + 1.0)

rv_pred = stats.nbinom(n_pred, p_pred)

ks, pmf, cdf = nbinom_pmf_cdf_trunc(n_pred, p_pred, q=0.999)

fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf, name="Posterior predictive PMF"))
fig.update_layout(
    title="Posterior predictive for next count (Gamma–Poisson → negative binomial)",
    xaxis_title="k",
    yaxis_title="P(Y_new=k | data)",
)
fig.show()

{
    "alpha_post": float(alpha_post),
    "beta_post": float(beta_post),
    "predictive_n": float(n_pred),
    "predictive_p": float(p_pred),
    "predictive_mean": float(rv_pred.mean()),
}
{'alpha_post': 170.0,
 'beta_post': 51.0,
 'predictive_n': 170.0,
 'predictive_p': 0.9807692307692307,
 'predictive_mean': 3.3333333333333406}
# Generative modeling: heterogeneous Poisson rates
# Lambda_i ~ Gamma(shape=n, scale=(1-p)/p), then Y_i|Lambda_i ~ Poisson(Lambda_i).
# Marginally Y_i ~ NB(n, p).

n, p = 3.0, 0.4
m = 200_000

lam = rng.gamma(shape=n, scale=(1 - p) / p, size=m)
y_mix = rng.poisson(lam)

# Compare empirical histogram to the NB PMF on a truncation grid
ks, pmf, _ = nbinom_pmf_cdf_trunc(n, p, q=0.999)
counts = np.bincount(y_mix, minlength=int(ks[-1]) + 1)[: len(ks)]
pmf_hat = counts / counts.sum()

fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_hat, name="Empirical (Gamma–Poisson)", opacity=0.65))
fig.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="NB theory"))
fig.update_layout(
    title="Gamma–Poisson mixture produces a negative binomial count distribution",
    xaxis_title="k",
    yaxis_title="Probability",
)
fig.show()

11) Pitfalls#

  • Parameterization confusion: some sources count the total number of trials rather than failures, or swap the roles of success and failure. Always confirm what the random variable represents.

  • n as real vs integer: the “wait for n successes” story needs integer n; for real n, interpret via the Gamma–Poisson mixture.

  • Degenerate boundary: p=1 collapses to (K\equiv 0). Values extremely close to 0 or 1 can create numerical headaches.

  • Numerical stability: direct factorial / binomial-coefficient computations overflow quickly. Prefer logpmf with gamma functions (as done here and in SciPy).

  • Truncation: visualizations and numerical sums over (k\in{0,1,2,\dots}) require truncating the tail; check captured mass.

12) Summary#

  • nbinom is a discrete distribution on (\{0,1,2,\dots\}) modeling failures before n successes.

  • PMF: (\binom{k+n-1}{k}(1-p)^k p^n) (the gamma-function form extends to real n>0).

  • Mean/variance: (\mathbb{E}[K]=n(1-p)/p), (\mathrm{Var}(K)=n(1-p)/p^2) ⇒ overdispersion vs Poisson.

  • Key relationship: Gamma–Poisson mixture ⇔ negative binomial.

  • For computation, prefer log-PMF; for simulation, the Gamma–Poisson sampler is simple and exact.